Neural Networks
Interpretability in a Data Driven World
Sometimes you do not care why a decision was made. In other cases, knowing the ‘why’ is important.
Human desire to find meaning in the world.
Performance vs. opacity: the most accurate models are often the hardest to interpret.
ML models pick up biases from the training data.
ML models can be debugged and audited if they are interpretable.
Validation: verify whether accuracy results come from artifacts in the data.
Important in health and social sciences: accountability.
Exploration and analysis in the sciences: extract insights from complex systems.
The degree to which a human can understand the cause of a decision or can consistently predict the model’s result.
Models inherently interpretable: linear regression, logistic regression, decision tree…
Lower predictive performance in comparison to other machine learning models.
However, insights are hidden in increasingly complex models.
Interpreting deep networks remains a young and emerging field of research.
Numerous coexisting approaches.
We will present some possible techniques.
Focus on supervised learning.
Subset selection:
Shrinkage:
Dimension Reduction:
Permutation feature importance
Permute the feature’s values: this breaks the relationship between the feature and the outcome.
A feature is “important” if shuffling its values increases the model error, because in this case the model relied on the feature for the prediction.
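The idea above can be sketched from scratch in a few lines (a toy example with our own names, not any particular library's API):

```python
# Permutation feature importance, minimal sketch.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends only on column 0; column 1 is pure noise.
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=500)

# "Model": ordinary least squares, standing in for any trained predictor.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
predict = lambda X: X @ beta
mse = lambda y, yhat: float(np.mean((y - yhat) ** 2))

baseline = mse(y, predict(X))
importance = {}
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])   # break feature/outcome link
    importance[j] = mse(y, predict(Xp)) - baseline  # error increase

print(importance)
```

Shuffling the informative feature increases the error sharply, while shuffling the noise feature barely changes it.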
What features has the neural network learned?
Activation maximization (AM): the input that maximizes the activation of that unit.
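AM can be illustrated with gradient ascent on the input of a single fixed unit (a toy sketch with invented weights; real uses add norm constraints or regularizers on the input):

```python
# Activation maximization: ascend the INPUT of a unit a(x) = tanh(w.x + b).
import numpy as np

w = np.array([1.0, -2.0, 0.5])   # pretend-trained weights of one unit
b = 0.1

def activation(x):
    return np.tanh(w @ x + b)

def grad_x(x):
    # d/dx tanh(w.x + b) = (1 - tanh^2(w.x + b)) * w
    return (1.0 - activation(x) ** 2) * w

rng = np.random.default_rng(0)
x = rng.normal(scale=0.1, size=3)   # start from small random noise
start = activation(x)
for _ in range(200):
    x += 0.1 * grad_x(x)            # gradient ascent on the input

# x now aligns with w: the pattern this unit "detects".
print(start, activation(x))
```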
Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, Christoph Molnar (2023).
Methods for interpreting and understanding deep neural networks, Montavon et al (2018).
pacman::p_load(NeuralNetTools, tidyverse,
               nycflights13, nnet)
tomod <- flights %>%
  filter(month == 12 & carrier == "UA") %>%
  select(arr_delay, dep_delay, dep_time,
         arr_time, air_time, distance) %>%
  drop_na() %>%  # drop cancelled flights (NA delays)
  # mutate_each()/funs() are deprecated; across() is the current idiom
  mutate(across(-arr_delay, ~ as.numeric(scale(.x)))) %>%
  mutate(arr_delay = scales::rescale(arr_delay, to = c(0, 1))) %>%
  data.frame()
mod <- nnet(arr_delay ~ ., size = 5,
            linout = TRUE, data = tomod,
            trace = FALSE)
plotnet(mod)
NeuralNetTools: Visualization and Analysis Tools for Neural Networks, Marcus W. Beck (2018).
Lloyd Shapley
It is set in a cooperative game with a set \(N\) of \(n\) agents. Each agent has only two choices, cooperate or not cooperate; the number of possible coalitions is therefore \(2^{n}\).
A coalition \(S\) is a subset of the set of agents \(N\); the set of all possible coalitions is the power set \(\mathcal{P}(N)\), so \(S \in \mathcal{P}(N)\).
The characteristic function \(v: \mathcal{P}(N) \rightarrow \mathbb{R}\) assigns to each coalition \(S\) the total payoff that its members can jointly obtain.
\[ \varphi_{i}(v) = \frac{1}{n} \sum_{S \subseteq N \setminus \{i\}} \binom{n-1}{|S|}^{-1} \left( v(S \cup \{ i \}) - v(S) \right) \]
\[ \varphi_{i}(v) = \frac{1}{\text{number of agents}} \sum_{\text{coalitions that exclude } i} \frac{\text{marginal contribution of } i \text{ to this coalition}}{\text{number of coalitions of this size that exclude } i} \]
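The formula can be evaluated exactly for small games. A minimal sketch (toy additive game of our own invention, where each agent's Shapley value should equal its fixed payoff):

```python
# Exact Shapley values, computed directly from the formula above.
from itertools import combinations
from math import comb

N = [0, 1, 2]

def v(S):
    # Additive toy game: each agent contributes a fixed payoff.
    payoff = {0: 10.0, 1: 20.0, 2: 30.0}
    return sum(payoff[i] for i in S)

def shapley(i):
    n = len(N)
    others = [j for j in N if j != i]
    total = 0.0
    for size in range(n):
        for S in combinations(others, size):
            # marginal contribution of i, weighted by 1 / C(n-1, |S|)
            total += (v(set(S) | {i}) - v(S)) / comb(n - 1, size)
    return total / n

print([shapley(i) for i in N])   # [10.0, 20.0, 30.0]
```

For an additive game the values are exactly the individual payoffs, which is a quick sanity check of the implementation.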
Explaining a linear regression model
Linear regression model on the California housing dataset.
20,640 blocks of houses across California in 1990, where our goal is to predict the natural log of the median home price from 8 different features:
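For a linear model with independent features, Shapley values have a closed form: \(\varphi_i(x) = \beta_i (x_i - \bar{x}_i)\), and they sum to \(f(x) - \mathbb{E}[f(X)]\). A sketch on toy data (standing in for the housing features, not the actual dataset):

```python
# Closed-form Shapley values for a linear model.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=1000)

# OLS with intercept
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, beta = coef[0], coef[1:]
predict = lambda X: intercept + X @ beta

x = X[0]                           # instance to explain
phi = beta * (x - X.mean(axis=0))  # per-feature Shapley attributions

# Efficiency property: attributions account for the gap to the average.
gap = predict(x[None, :])[0] - predict(X).mean()
print(phi.sum(), gap)              # these two numbers match
```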
SHAP as a method for quantifying the contribution of each data point in the training set of a convolutional neural network.
Disclaimer: don’t invest using candlestick charts!
Using the shap library in Python, Shapley values were calculated for each pixel of every image in the training set.
Ribeiro, Singh, and Guestrin. ““Why Should I Trust You?”: Explaining the Predictions of Any Classifier.” ArXiv.org, 16 Feb. 2016, arxiv.org/abs/1602.04938.
Explain what led the ML model to give a certain prediction to an instance
What variables affected the decision?
Observe the behavior of the ML model with points around the instance of interest
Simulate points in the neighborhood
Create an interpretable surrogate model to explain the behaviour of the ML model around the instance
“Understanding LIME | Explainable AI.” Youtube, www.youtube.com/watch?v=CYl172IwqKs.
Select an instance of interest
Generate new instances around the original instance and calculate their result with the ML model
Apply a weight to the new instances depending on the distance to the original instance
Obtain a new interpretable model with the weighted instances
Interpret the local model
\(L(f,g,\pi_x)\) is a loss function (usually MSE) between the predictions of the ML model \(f\) and those of the simplified model \(g\)
\(\pi_x(z)\) is a function that weights the new instances according to their distance to the original instance
\(\Omega(g)\) penalizes high complexity models
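The three ingredients above combine into LIME’s objective (notation as in the original paper):

\[ \xi(x) = \underset{g \in G}{\arg\min} \; L(f, g, \pi_x) + \Omega(g) \]

where \(G\) is the family of interpretable models, e.g. sparse linear models.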
Can be used for classification or regression models
Tabular data
Text or Image data
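The steps above can be sketched end to end (a from-scratch toy, not the lime library's API: sample perturbations, weight by proximity, fit a weighted linear surrogate):

```python
# LIME-style local surrogate, minimal sketch.
import numpy as np

rng = np.random.default_rng(0)

# Black-box model f: nonlinear in its inputs.
f = lambda X: np.sin(X[:, 0]) + X[:, 1] ** 2

x0 = np.array([0.0, 1.0])                 # instance of interest

# 1) sample new instances around x0
Z = x0 + rng.normal(scale=0.3, size=(2000, 2))
yz = f(Z)

# 2) proximity weights pi_x(z): an RBF kernel on distance to x0
d2 = ((Z - x0) ** 2).sum(axis=1)
w = np.exp(-d2 / (2 * 0.3 ** 2))

# 3) weighted least squares -> interpretable linear surrogate g
A = np.column_stack([np.ones(len(Z)), Z - x0])
s = np.sqrt(w)
coef, *_ = np.linalg.lstsq(A * s[:, None], yz * s, rcond=None)

# The surrogate's slopes approximate the local gradient of f at x0:
# d/dx0 sin(x) = cos(0) = 1, and d/dx1 x^2 = 2*1 = 2.
print(coef[1:])
```

Interpreting the surrogate’s coefficients then answers “which variables affected the decision, and in which direction” near this instance.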
We shall analyse the biopsy dataset, which is part of the MASS package.
“LIME | Machine Learning Model Interpretability Using LIME in R.” Analytics Vidhya, 18 Jan. 2021, www.analyticsvidhya.com/blog/2021/01/ml-interpretability-using-lime-in-r/.
## 75% of the sample size
smp_size <- floor(0.75 * nrow(biopsy))
## set the seed
set.seed(123)
train_ind <- sample(seq_len(nrow(biopsy)), size = smp_size)
train_biopsy <- biopsy[train_ind, ]
test_biopsy <- biopsy[-train_ind, ]
model_rf <- caret::train(class ~ ., data = train_biopsy,
                         method = "rf",  # random forest
                         trControl = trainControl(method = "repeatedcv",
                                                  number = 10, repeats = 5,
                                                  verboseIter = FALSE))
Random Forest
512 samples
9 predictor
2 classes: 'benign', 'malignant'
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 5 times)
Summary of sample sizes: 461, 460, 461, 460, 461, 461, ...
Resampling results across tuning parameters:
mtry Accuracy Kappa
2 0.9765460 0.9490151
5 0.9706712 0.9362207
9 0.9659729 0.9259912
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 2.
Difficulties defining the neighborhood
Difficulties with non-linearity
Incorrect sampling of new instances can produce improbable instances
Explanations for nearby points may vary greatly
Easily manipulated to hide biases
Alvarez-Melis, David, and Tommi S. Jaakkola. “On the robustness of interpretability methods.” arXiv preprint arXiv:1806.08049 (2018).
Slack, Dylan, Sophie Hilgard, Emily Jia, Sameer Singh, and Himabindu Lakkaraju. “Fooling lime and shap: Adversarial attacks on post hoc explanation methods.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pp. 180-186 (2020).
To Build Truly Intelligent Machines, Teach Them Cause and Effect, Judea Pearl interview.
Judea Pearl and Dana Mackenzie. The book of Why: The New Science of Cause and Effect. Basic Books, 2018.
Christoph Molnar. Interpretable Machine Learning, 2023.
A counterfactual explanation of a prediction describes the smallest change to the feature values that changes the prediction to a predefined output.
How can this applicant obtain a predicted good credit risk with probability larger than 50% (versus the current 24.2%)?
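For a linear scoring model the smallest such change can be computed in closed form, as in this sketch (toy logistic weights of our own, not the credit model above): move the instance along the weight vector just far enough to cross the decision boundary.

```python
# Counterfactual explanation, minimal sketch for a toy logistic model:
# find the smallest change to x that lifts P(good) above 50%.
import numpy as np

w = np.array([0.8, -0.5])         # pretend-trained logistic weights
b = -1.0

def p_good(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

x = np.array([0.2, 0.4])          # original instance, p_good(x) < 0.5

# For a linear score the minimal L2 change is along w: move just enough
# to reach the decision boundary w.x + b = 0 (plus a tiny margin).
needed = -(w @ x + b) + 1e-6      # score deficit to the boundary
delta = needed * w / (w @ w)      # smallest-norm step achieving it
x_cf = x + delta

print(p_good(x), p_good(x_cf))    # below 0.5, then just above it
```

Reading off `delta` then tells the applicant which features to change, and by how little, to flip the prediction.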
Tanmayee Narendra, Anush Sankaran, Deepak Vijaykeerthy, Senthil Mani. Explaining Deep Learning Models using Causal Inference, 2022.
The three core strengths of deep learning for causal learning are:
Zizhen Deng, Xiaolong Zheng, Hu Tian, Daniel Dajun Zeng. Deep Causal Learning: Representation, Discovery and Inference, 2022.
Figure 8. Source: Zizhen Deng, Xiaolong Zheng, Hu Tian, Daniel Dajun Zeng. Deep Causal Learning: Representation, Discovery and Inference, 2022.
Deep Causal Learning example. Real model: \(E(X_3 \mid do(X_2 = x)) = -1.1x + 84\). Source: Zizhen Deng, Xiaolong Zheng, Hu Tian, Daniel Dajun Zeng. Deep Causal Learning: Representation, Discovery and Inference, 2022.
Source: Bodendorf, Sauter, and Franke, 2023.
Frank Bodendorf, Maximilian Sauter, Jörg Franke. A mixed methods approach to analyze and predict supply disruptions by combining causal inference and deep learning. International Journal of Production Economics, Volume 256, 2023.
This is the tip of the iceberg!
Thank you!!!